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Abstract. We propose a method called ideal regression for approximating an arbitrary 
system of polynomial equations by a system of a particular type. Using techniques from 
approximate computational algebraic geometry, we show how we can solve ideal regression 
directly without resorting to numerical optimization. Ideal regression is useful whenever the 
solution to a learning problem can be described by a system of polynomial equations. As an 
example, we demonstrate how to formulate Stationary Subspace Analysis (SSA), a source 
separation problem, in terms of ideal regression, which also yields a consistent estimator 
for SSA. We then compare this estimator in simulations with previous optimization-based 
approaches for SSA. 



1. Introduction 

Regression analysis explains the relationship between covariates and a target variable by 
estimating the parameters of a function which best fits the observed data. The function is 
chosen to be of a particular type (e.g. linear) to facilitate interpretation or computation. In 
this paper, we introduce a similar concept to sets of polynomials equations: given arbitrary 
input polynomials, the aim is to find a set of polynomials of a particular type that best 
approximates the set of solutions of the input. As in ordinary regression, these polynomials 
are parameterized to belong to a certain desired class. This class of polynomials is usually 
somewhat simpler than the input polynomials. We call this approach ideal regression, inspired 
by the algebraic concept of an ideal in a ring. In fact, the algorithm that we derive is based on 
techniques from approximate computational algebraic geometry. In machine learning, ideal 
regression is useful whenever the solution to a learning problem can be written in terms of 
a set of polynomial equations. We argue that our ideal regression framework has several 
advantages: (a) it allows to naturally formulate regression problems with intrinsical algebraic 
structure; (b) our algorithm solves ideal regression directly instead of resorting to less efficient 
numerical optimization; and (c) the algebraic formulation is amenable to a wide range of 
theoretical tools from algebraic geometry. We will demonstrate these points in an application 
to a concrete parametric estimation task. 
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Figure 1. Ordinary regression (left panel) fits the data (blue points) by a 
function fg (red line) parametrized by 9. Ideal regression (right panel) approx- 
imates a set of arbitrary input polynomials qi,--.,q n by a set of polynomials 
pi ,...,p e that are of a special type parameterized by 9. The approxi- 
mation is in terms of their set of solutions: the parameter 9 is chosen such 



that V(pg X \ . . . ,Pg' l> ) (red shape) best fits the the solutions to the input (blue 
shape), in the space of T\ and T2. 



More formally, the analogy between ordinary and ideal regression is as follows. In ordinary 
regression, we are given a dataset V = {(xi,yi)}f =1 C R D x R and the aim is to determine 
parameters 9 for a regression function fg : R D — > R such that f(xi) fits the target m according 
to some criterion, i.e. informally speaking, 

(1) fe(xi) ~ m for all samples (xi,yi) £ V. 

In ideal regression, the input consists of a set of n arbitrary polynomial equations qi(T) = 
■ ■ ■ = q n (T) = in the vector of variables T = (Ti, . . . , To) which may e.g. correspond to the 
coordinates of data. That is, the data set T> = {qi, . . . , q n } consists of the coefficients of the 
input polynomials. Let V(X>) C G d be the set of solutions to the input equations, i.e. 

V(X>) ={i£ € D I q(x) = OVp G V}, 

where p(x) denotes the evaluation of the polynomial p on the values x. The aim of ideal 
regression is to determine another set of polynomials pi , ■ ■ ■ ,Pg m ^ parametrized by 9 that 
best approximate the input polynomials T> in terms of their set of solutions; that is, informally, 

(2) \( Pe 1 \...J e m) )^\{V). 

The class of polynomials (parametrized by 9) by which we approximate arbitrary input is 
chosen such that it has certain desirable properties, e.g. is easy to interpret or is of a particular 
type prescribed by the context of the application. Thus, in ordinary regression we fit a 
parametrized function to arbitrary data and in ideal regression we fit a parametrized system 
of polynomial equations to arbitrary systems of polynomial equations. Note that even if V(P) 
has no exact solution (e.g. due to noise, over-determinedness, or when reducing degrees of 
freedom), we can still find an approximate regression system close to the inputs. Figure [l] 
illustrates the analogy between ordinary regression and ideal regression. 
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The natural algebraic object to parameterize sets of equations up to additive and mul- 

[71 n\ ( m \ 

tiplicative ambiguities are ideals in polynomial ringa I The parametric family p g , . . . ,p g 
corresponds to a parametric ideal J-g in the polynomial ring C [Ti , . , . , . In ring theoretic 
language, the informal regression condition reformulates to 

(3) J~0 Bapprox {qi,...,q n }, 

where 9 a pprox stands for being approximately contained. Intuitively, this means that the 
equations qi are well-approximated by the parametric ideal J-g. 

The ideal regression setting is fairly general. It can be applied to a wide range of learning 
settings, including the following. 

• Linear dimension reduction and feature extraction. When linear features of 
data are known, ideal regression provides the canonical way to estimate the target 
parameter with or without "independent" or "dependent" labels. This subsumes 
PCA dimensionality reduction, linear regression with positive codimension and linear 
feature estimation. 

• Non-linear polynomial regression. When the regressor is a set of polynomials 
with specific structure, as e.g. in positive codimensional polynomial regression or 
reduced rank regression, ideal regression allows to estimate the coefficients for the 
regressor polynomials simultaneously. 

• Comparison of moments and marginals. Ideal regression is the canonical tool 
when equalities or projections of cumulants are involved. The example we will pursue 
in the main part of the paper will be of this type, as it is possibly the simplest one 
where non-linear polynomials occur naturally. 

• Kernelized versions. Non- linear feature mapping and kernelization is natural to 
integrate in the regression process as the presented estimator builds on least-squares 
estimates of vector spaces. 
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Figure 2. Representation of the problem: the left panel shows the covariance 
matrices Si and £2 with the desired projection v. In the middle panel, this 
projection is defined as the solution to a quadratic polynomial. This polyno- 
mial is embedded in the vector space of coefficients spanned by the monomials 
X 2 , Y 2 and XY shown in the right panel. 



that is, sets of polynomials closed under addition in the set, and under multiplication with arbitrary 
polynomials 
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1.1. Example: finding common marginals. In this paper, we will demonstrate ideal 
regression in a non-linear example: we will reformulate a statistical marginalization task as 
ideal regression. Namely, we study the following problem: 

Problem 1.1. Given D-variate random variables X\, . . . ,X m , find a projection P G lR dxD 
to a d-dimensional subspace under which which the X.- L are identically distributed, i.e. 

PX X PX m . 

For example, the Xi can model different data clusters or epochs, for which we want to 
find an informative common projection P. A special subcase of this problem, where the 
Xi are approximated by Gaussians, is Stationary Subspace Analysis |23} [19] which has been 
applied successfully to Brain- Computer- Interfacing [25] > Computer Vision [IS], domain adap- 
tation [TU] , geophysical data analysis and feature extraction for change point detection [2H [2J • 
Previous SSA algorithms [231 ITD] [T2] have addressed this task by finding the minimum of an 
objective function that measures the difference of the projected cumulants on the sought-after 
subspace. 

Under the assumption that such a linear map P exists, we can describe the set of all 
maps yielding common marginals. A necessary (and in practice sufficient) condition is that 
the projections of the cumulants under P agree. Thus the coefficients of the polynomial 
equations are given by the coefficients of the cumulants of the X{ (see Section [2] for details). 
The output ideals correspond to the possible row-spans of P; note that the fact that the PXi 
have identical distribution depends only on the row-span of P. Equivalently, the regression 
parameter 9 ranges over the set all sub-vector spaces of dimension d in L>-space, i.e. over 
the Grassmann manifold Gr(d, D). The regression ideal Tq is then just 1(9), the ideal of the 
vector space 9, considered as a subset of complex D-space. 

1.2. Outline of the algorithm. In the application of ideal regression studied in this paper, 
the aim is to determine linear polynomials that have approximately the same vanishing set 
as the input polynomials derived from differences of cumulants. The representation in which 
the algorithm computes is the vector space of polynomials: each polynomial is represented 
by a vector of its coefficients as shown in the right panel of Figure [2} The coefficient vector 
space is spanned by all monomials of a particular degree, e.g. in Figure [2] the axis correspond 
to the monomials X 2 , XY and Y 2 because all homogeneous polynomials of degree two in two 
variables can be written as a linear combination of these monomials. 

The algorithm uses an algebraic trick: We first generate more and more equations by 
making the given ones more complicated; then, when their number suffices, we can make 
them simpler again to end at a system of the desired type. Playing on analogies, we thus call 
the algorithm the Miinchhausen procedural 

In the first part of the Miinchhausen algorithm, we generate the said larger system of 
equations, having a particular (higher) degree, but the same vanishing set. This is done by 
multiplying the input equations by all monomials of a fixed degree, as illustrated in the right 
panel of Figure [3j We will show that for any (generic) input, there always exists a degree such 
that the resulting polynomials span a certain linear subspace of the coefficient vector space 
(up to noise), i.e. that the red curve will always cross the blue curve. That is, we show that the 
coefficient vectors lie approximately on a linear subspace of known dimension (subspace (a) 

2 After the eponymous and semi-fictional Baron Hieronymus Carl Friedrich Miinchhausen, who purportedly 
pulled himself and his horse out of the swamp by his own hair; compare the Miinchhausen trilemma in 
epistemiology. 
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Figure 3. The left panel shows the vector space of coefficients; the input 
polynomials (red points) lie approximately on the subspace (a) of polynomials 
vanishing on S. In order to reduce the degree of the polynomial, we determine 
the intersection (orange) with the subspace (b) of polynomials that are divisible 
by a variable T\. The polynomials in this intersection are isomorphic to the 
polynomials of degree k — 1 that vanish on S. The right panel shows the 
Miinchhausen process of multiplying the set of polynomials Qi up to a degree 
so that we have enough elements | Qi | to determine the basis for the subspace 
(a). In the shown case, where we start with IQ2I = 5 input polynomials in 
D = 6 variables and dim S = 3, we need to go up to degree 5. From Q5 we 
then descend to a linear form by repeatedly dividing out a single variable as 
shown in the left panel. 



in Figure [5]) . This is the case because we have assumed that there exists a subspace of the 
data space on which the marginals agree. After this, we obtain an approximate basis for this 
subspace in coefficient space by applying PCA dimension reduction; this provides us with an 
approximate basis for subspace (a) in the left panel of Figure [3] 

In the second part of the algorithm, we reduce the degree of the system of equations, i.e. 
the approximate basis for subspace (a), by repeatedly dividing out single variable^} In each 
step, we reduce the degree by one as illustrated by the left panel of Figure [3j We compute a 
basis for the intersection (orange line) of subspace (a) and the subspace (b) of polynomials 
of degree k that are divisible by the variable T±. By dividing each basis element by T\ we 
have obtained a system of equations of lower degree, which has approximately the same set of 
solutions as the input. We repeat this process until we arrive at a system of linear equations. 

1.3. Relation to other work. The ideal regression approach draws inspiration from and 
integrates several concepts from different fields of research. The first important connection 
is with computational algebra, as the estimation procedure is essentially a ring theoretic 
algorithm which can cope with noise and inexact data. In the noiseless case, estimating 
the regression parameter is essentially calculating the radical of a specific ideal, or, more 
specifically, computing the homogenous saturation of an ideal. These tasks are notoriously 
known to be very hard in generaQ however, for generic inputs, the computational complexity 
somewhat drops into feasible regions, as implied by the results on genericity in the appendix. 



i.e. saturating with the homogenizing variable 

mamely, doubly exponential in the number of variables, see e.g. [T7] section 4.] 
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The best known algorithms for computations of radicals are those of [8], implemented in 
AXIOM and REDUCE, the algorithm of [6], implemented in Macaulay 2, the algorithm of 
|3j, currently implemented in CoCoA, and the algorithm of |16j and its modification by |17j . 
available in SINGULAR. Closely related to homogenous saturation is also the well-known 
Buchberger's algorithm for computation of reduced Grobner basis, which can be seen as the 
inhomogenous counterpart of homogenous saturation when a degree-compatible term order 
is applied. 

The second contribution comes from the field of approximate and numerical algebra, as 
the exact algorithms from computational algebra are numerically unstable even under small 
variations of the inputs and thus unfeasible for direct application to our case. The first 
application of numerical linear algebra to vector spaces of polynomials can be found in [1], 
the numerical aspects of noisy polynomials have been treated in [20J. Also, the nascent field 
of approximate algebra has developed tools to deal with noise, see [15]. In particular, the 
approximate vanishing ideal algorithm [TT] fits polynomial equations to noisy data points 
with a method that essentially applies a sequence of weighted polynomial kernel regressions. 
The estimator for ideal regression given in this paper is essentially a deterministic algorithm 
using approximate computational algebra. 

On the conceptual level, the idea of using (linear and commutative) algebra as an ingredient 
in statistical modelling and for solution of statistical problems is natural, as algebra is the 
science of structure. Therefore, studying structure in a statistical context makes algebra 
under stochastical premises a canonical tool. This idea gives rise to a plethora of different 
approaches, which today are subsumed as the field of algebraic statistics - in the broader 
meaning of the term. Standard references in the field include [21], [5] and [9], or pQ and [26] . 
Also, there is a range of machine learning methods dealing with non-linear algebraic structure 
or symmetries in data, e.g. [131 E] or [22]. In many applications, the concept of genericity 
also arises as the algebraic counterpart of statistical nondegeneracy; where it is mostly applied 
to the choice of model parameters and the study of a general such. Also, algebraic geometry 
and commmutative algebra are already successfully applied there to obtain theoretical results 
on statistical objects and methods. 

2. The algorithm 

The probability distribution of every smooth real random variable X can be fully charac- 
terized in terms of its cumulants, which are the tensor coefficients of the cumulant generating 
function. Before we continue, we introduce a useful shorthand notation for linearly trans- 
forming tensors, i.e. cumulants. 

Definition 2.1. Let A £ C be a matrix. For a tensor T E R , we will denote by 
A o T the application of A to T along all tensor dimensions, i.e. 

D D 

(A o T) ix _ ik = ^2 ■ ■ ■ ^2 Ai 1 j 1 • . . . • Aj, j, ■ 

ji=i jfc=i 

Using this, we can now define the cumulants of a D-dimensional smooth real random 
variable X via the Taylor expansion of the cumulant generating function. 

Definition 2.2. Let AT be a smooth real D- dimensional random variable. Then the cumulant 



generating function of X is defined as xx(t) = log e lT ' x J = Y^k=i^ T ) TTi where 
r £ H D . The coefficient tensors Kk(X) are called the fc-th cumulants of X. 
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For the problem addressed in this paper, cumulants are a particularly suitable representa- 
tion because the cumulants of a linearly transformed random variable are the multilinearly 
transformed cumulants, as a classical and elementary calculation shows: 

Proposition 2.3. Let X be a smooth real D -dimensional random variable and let A G IR dxD 
be a matrix. Then the cumulants of the transformed random variable A ■ X are the trans- 
formed cumulants, Kk(AX) = AoKk{X) where o denotes the application of A along all tensor 
dimensions. 



We now derive an algebraic formulation for Problem note that PXi ~ PXj if and only 
if vX{ ~ vXj for all row vectors v G span P T . 

Problem 2.4. Find all d-dimensional linear subspaces in the set of vectors 

S = {v€R D v T X 1 v T X m } 



= {v £ E, v 1 o Kk (Xi) = v 1 o « fc (Xj), 

k G M, 1 < i, j < m} 
= {v£R D | v T o ( Kk (Xi) - K k (X m )) = 0, 

k G M, 1 < i < m}. 

The equivalence of the problems then follows from the fact that the projection P can be 
characterized by its row-span which is a d-dimensional linear subspace in S. Note that while 
we are looking for linear subspaces in S, in general S itself is not a vector space. Apart 
from the fact that S is homogeneous, i.e. XS = S for all A G R \ {0}, there is no additional 
structure that we utilize. To use the tools from computational algebra, we now only need to 
consider the left hand side of each of the equations as polynomials /i , . . . , f n in the variables 
Ti, . . . ,Td, 

fj = [Ti • • • T D ] o (K k (Xi) - K k (X m )), 

with j running through n combinations of i and k. The fj are formally elements of the 
polynomial ring over the complex numbers C[Ti, . . . , Tp]. In particular, if we restrict ourselves 
to a finite number of cumulants, we can write S as the set of solutions to a finite number n 
of polynomial equations, 

S = {v G R D | hiv) = ■■■ = f n (v) = 0, 1 < j < k} , 

which means that S is an algebraic set or an algebraic variety. Thus, in the language of 
algebraic geometry, we can reformulate the problem as follows. 

Problem 2.5. Find all d-dimensional linear subspaces in the algebraic set 

S = V(f 1 ,...,f n ). 

In order to describe in algebraic terms how this can be done we need to assume that a 
unique solution indeed exists, while assuming as little as possible about the given polynomials. 
Therefore we need to employ the concept of generic polynomials, which is defined rigorously 
in the supplemental material. Informally, a polynomial is generic when it does not fulfill 
any other conditions than the imposed ones and those which follow logically. This can be 
modelled by assuming that the coefficients are independently sampled from a sufficiently 
general probability distribution (e.g. Gaussians), only subject to the imposed constraints. 
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The statements following the assumption of genericity are then probability-one statements 
under those sampling conditions. 

In our context, we can show that under genericity conditions on the fx, . . . , f n , and when 
their number n is large enough, the solution is unique and equivalent to finding a linear 
generating set for the radical of the ideal (fx,..., f n ), which equals its homogenous saturation, 
as Corollary ?? in the supplemental material asserts: 

Problem 2.6. Let fx, ■ ■ ■ , fn with n > D + 1 be generic homogenous polynomials vanishing 
on a linear d- dimensional subspace S C <C D . Find a linear generating set £\, . . . ,(-D-d f° r the 
radical ideal 

((fx,...,f n ):T D ) = ^(fx,...,f n )=l(S). 

3. Approximate Algebraic Computations 
In this section we present the Miinchhausen algorithm which computes the homogenous 



saturation in Problem 2.6 and thus computes the ideal regression in the marginalization 
problem. The algorithm for the general case is described in the appendix in section ??. 

The efficiency of the algorithm stems largely from the fact that we operate with linear 
representations for polynomials. That is, we first find enough elements in the ideal (fx, . . . , f n ) 
which we then represent in terms of coefficient vectors. In this vector space we can then find 
the solution by means of linear algebra. An illustration of this representation is shown in 
Figure [2] Let us first introduce tools and notation. 

Notation 3.1. We will write R = C[Ti, . . . , To] for the ring of polynomials over C in the 
variables Tx,... , Tp. We will denote the ideal of the d-dimensional linear space S C C D by 

5 = 1(5). 

In order to compactly write sub- vector spaces of certain degree, we introduce some notation. 

Notation 3.2. Let X be an ideal of R. Then we will denote the vector space of homogenous 
polynomials of degree k in X by X/%. 

The dimension of these sets can be later written compactly in terms of simplex numbers, 
for which we introduce an abbreviating notation. 

Notation 3.3. We denote the 6-th a-simplex number by A(a, b) = ( a+ Q _1 ), which is defined 
to be zero for a < 0. 

Since the polynomials arise from estimated cumulants we need to carry all algebraic com- 
putations out approximately. The crucial tool is to minimize distances in coefficient space 
using the singular value decomposition (SVD). 

Definition 3.4. Let A G C mxn be a matrix, let A = UDV T be its SVD. The approximate 
row span of A of rank k is the row span of the first k rows of V; the approximate left null 
space of A of rank k is the row span of the last k rows of U. 

These approximate spaces can be represented by matrices consisting of row vectors spanning 
them, the so-called approximate left null space matrix and approximate row span matrix. 

The key to the problem is the fact that there exists a degree N such that the ideal generated 
by the fx, ■ ■ ■ , f n contains all homogenous polynomials of degree N in s. This allows us to 
increase the degree until we arrive at a vector space where it suffices to operate linearly (see 
Figure [3]) . 
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Theorem 3.5. Let f\, . . . , f n E 3 be generic homogenous polynomials in D variables of fixed 
degrees d\, . . . , d n each, such that n > D. Let I = (fi,..., f n ) . Then one has 

(1 : Ti) = s 

for any variable Ti. In particular, there exists an integer N such that 

N is bounded from below by the unique index M belonging to the first non-positive coefficient 
ajif of the power series 

n?=i(i-* di ) i 



k=0 



(1-i) 



D 



{i-ty 



If we have di < 2, and D < 11, then equality holds. 



This summarizes Proposition ??, Corollary ?? and Theorem ?? from the supplemental 
material for the case s = 1(5), the proof can be found there. In the appendix, we conjecture 
that the statement on N is also valid for general di and D - this generalizes Froberg's con- 
jecture on Hilbert series of semi-regular sequences [7]. In the supplemental material, we give 
an algorithm which can be used to prove the conjecture for fixed di and D and thus give an 



exact bound for N in those cases. Theorem 3.5 guarantees that given our input polynomials, 



we can easily obtain a basis for the vector spaces of homogenous polynomials S/% with k > N. 
In this vector space, we are interested in the polynomials divisible by a fixed monomial Tf, 
they form the vector space n (Ti) = (sfl (Ti)) k ■ By dividing out the monomials Ti, we can 
then obtain the vector space of homogenous polynomials Sk-i of degree one less. 



Algorithm 1 The ideal regression estimator. Input: Generic homogenous polynomials 
fii---ifn £ 5 7 n ^ D; the dimension d of S. Output: An approximate linear generating 
set l\, ... , lr>-d for the ideal s. 



Determine some necessary degree ./V according to Theorem 3.5 (e.g. with Algorithm ??, 
or via Conjecture ??/Algorithm ??) 
Initialize Q <— [] with the empty matrix, 
for i = 1 . . . n do 

for all monomials M of degree — deg fi do 



Add a row vector of coefficients, Q 



Q ' 

fiM 



6: end for 

7: end for 

8: for k = N ... 2 do 

9: Set Q ^— ReduceDegree(Q) 
10: end for 

11: Compute the approximate row span matrix A f- 
12: Let ti^[Tx ■■■ T D ] a, for all 1 < i < D - d 



[oi • • • ctD-d] T °f Q °f rank D — d 



We explain how to proceed in the case where s = 1(S) is linear, and the fi are generic. 
The case of general s can be found in the appendix. So suppose that we have enough qua- 
dratic polynomials. Then we can determine an approximate basis for the vector space of 
homogenous degree 2 polynomials vanishing on S which is S2 , because our input polynomials 
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lie approximately on that subspace. From this we can obtain the linear homogenous poly- 
nomials Si by dividing out any monomial Tj; a basis of S\ can be directly used to obtain a 
basis of S. More generally, if we know a basis of the vector space of homogenous degree ft 
polynomials vanishing on S which is s^, we can obtain from it a basis of Sfc_i in a similar 
way; by repeating this stepwise, we eventually arrive at S\. This degree reducing procedure 
is approximatively applied in Algorithm [2j 

We now describe the Miinchhausen Algorithm [T] in detail. In Step[TJ we calculate a degree 
N up to which we need to multiply our input polynomials so that we have enough elements to 
approximately span the whole space Sat of polynomials vanishing on S. In the Steps 2 to 7 we 
multiply the input polynomials up to the necessary degree iV and arrange them as coefficient 
vectors in the matrix Q. This process is illustrated in the right panel of Figure |3j The rows of 
Q approximately span the space of polynomials vanishing on S, i.e. space (a) in the left panel 
of Figure [3j In Steps 8 to 10 we invoke Algorithm [2] to reduce the degree of the polynomials 
in Q until we have reached an approximate linear representation S\. Finally, in Step 11 we 
reduce the set of linear generators in Q to the principal D — d ones. 

Algorithm 2 ReduceDegree (Q). Input: Approximate basis for the vector space given as 
the rows of the (n x A(n, -D))-matrix Q; the dimension d of S. Output: Approximate basis 
for the vector space 5k— l, given as the rows of the (n' X A(n — 1, D))-matrix A 

1: for i = 1 . . . D do 

2: Let Qi <— the submatrix of Q obtained by 

removing all columns corresponding to 

monomials divisible by Tj 
3: Compute Li •<— the 

approximate left null space matrix of Qi 

of rank m - A(ft, D) + A(fc, d) 

+A(k-l,D)- A(fc-l,d) 
4: Compute L\ 4— the 

approximate row span matrix of LiQ 

of rank A (ft - 1, D) - A(ft - 1, d) 
5: Let L'l <— the matrix obtained from by 

removing all columns corresponding to 

monomials not divisible by Tj 
6: end for 

7: Let L <— the matrix obtained by vertical concatenation of all U! 
8: Compute A <— the 

approximate row span matrix of L of rank 

ri = min(n, £>(A(ft - 1, D) - A(ft - 1, d))) 



Given a set of polynomials of degree ft that vanish approximately S, Algorithm [2] computes 
another set polynomials of degree ft — 1 with the same property. This is achieved by dividing 
out variables approximately, in a way that utilizes as much information as possible to reduce 
the influence of estimation errors in the coefficients of the input polynomials. Approximate 
division by a variable Tj means that we find linear combinations of our input polynomials 
such the coefficients of all monomials not divisible by T, are as small as possible. Given a 
matrix of coefficient row vectors Q of degree ft, for each variable that we divide out the result 
is a matrix L" of polynomials of degree ft — 1 that also vanish approximately on the set of 
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Figure 4. Comparison of the ideal regression algorithm and the SSA al- 
gorithm. Each panel shows the median error of the two algorithms (vertical 
axis) for varying numbers of stationary sources in ten dimensions (horizontal 
axis). The noise level increases from the left to the right panel; the error bars 
extend from the 25% to the 75% quantile. 



solutions S. We iterate over all variables in the for- loop and combine the results L'(, . . . , 
in the final Steps 7 and 8. In order to find L'- for each variable Tj, we first determine a matrix 
Li that minimizes the coefficients for all monomials in Q that are not divisible by Tj. To 
that end, in Step 2 we remove all monomials not divisible by Tj to get a matrix Qi for which 
we then compute the left null space matrix Li in Step 3. The product LiQi is then a set of 
polynomials of degree k with minimal coefficients for all monomials not divisible by Tj. These 
polynomials lie approximately on the span of polynomials vanishing on S. In the next Step 4, 
we compute an approximate basis L' for this space and in Step 5 we reduce the degree by 
removing all monomials not divisible by Tj. Finally, in the last Steps 7 and 8 we combine 
all found solutions L", . . . , L" D using PCA. Note that both algorithm are deterministic and 
consistent estimators. 



4. Simulations 

We investigate the influence of the noise level and the number of dimensions on the accuracy 
and the runtime of the ideal regression algorithm in the special case of covariance matrices 
and compare it to the standard method for this case, the SSA algorithm |23j . We measure the 
accuracy using the subspace angle between the true and the estimated space of projections. 
The setup of the synthetic data is as follows: we fix the total number of dimensions to D = 10 
and vary the dimension d of the subspace with equal probability distribution from one to nine. 
We also fix the number of random variables to m = 26, yielding n = 25 quadratic polynomials. 
For each trial of the simulation, we choose a random basis for the subspace S and random 
covariance matrices to which we add a disturbance matrix parametrized by the noise level a. 

The results are shown in Figure |4| With increasing noise levels both algorithms become 
worse. For all noise levels, the algebraic method yields significantly better results than the 
standard optimization-based approach, over all dimensionalities. In Figure [4j we see that the 
error level of the ideal regression algorithm decreases with the noise level, converging to the 
exact solution when the noise tends to zero. In contrast, the error of original SSA decreases 
with noise level, reaching a minimum error baseline which it cannot fall below. In particular, 
the algebraic method significantly outperforms SSA for low noise levels, approaching machine 
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precision. At high noise levels, the algebraic method outperforms SSA on average, having 
lower error variance than SSA. 

5. Conclusion 

In this paper, we have presented the framework of ideal regression and an estimator which 
uses approximate computational algebra. Moreover, we have worked through a specific ex- 
ample: we have shown that the problem of finding common projections of marginals can be 
reformulated in terms of ideal regression and we have derived a practical algorithm, that 
we have evaluated in numerical simulations. Also, due to the algebraic formulation of the 
problem, we were able to derive previously unapproachable theoretical results on the esti- 
mation problem. We argue for a cross-fertilization of machine learning and approximate 
computational algebra: the former can benefit from the wealth of techniques for dealing with 
uncertainty and noisy data; the machine learning community may find in ideal regression a 
novel framework for representing learning problems and powerful proof techniques. 
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